Data Set Editing by Ordered Projection

Authors

  • Jesús S. Aguilar-Ruiz
  • José Cristóbal Riquelme Santos
  • Miguel Toro
Abstract

Department of Computer Science, University of Seville, Spain. {aguilar, riquelme, mtoro}@lsi.us.es

In this paper, an editing algorithm based on the projection of the examples onto each dimension is presented. The algorithm, which we call EOP (Editing by Ordered Projection), has some interesting characteristics: a substantial reduction in the number of examples in the database; a lower computational cost than other typical algorithms, owing to the absence of distance calculations; conservation of the decision boundaries, especially from the point of view of axis-parallel classifiers; and a reduction in the decision tree size or the number of decision rules. The performance of EOP is shown by comparing the results provided by C4.5 [5] before and after applying it to databases with continuous attributes. These experiments were carried out using several databases from the UCI repository [1]. The use of EOP as a preprocessing method before applying any axis-parallel learning algorithm makes it a valuable tool in the field of data mining.
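The abstract gives no pseudocode, so the following is only a minimal sketch of the general idea of editing by projection, not the authors' EOP procedure. It assumes that an example can be dropped when, in the sorted order of every attribute, both of its neighbours carry the same class label (that is, it never sits on a class border). The function name `edit_by_projection`, the data layout, and the handling of ties (ignored here) are all hypothetical choices made for illustration. Note that only sorting and label comparisons are used, with no distance calculations.

```python
from typing import List, Sequence, Tuple

# Hypothetical data layout: (attribute values, class label)
Example = Tuple[Sequence[float], str]


def edit_by_projection(examples: List[Example]) -> List[Example]:
    """Drop examples that are interior to a same-class run in every dimension.

    Simplified illustration only: ties between equal attribute values are
    not treated specially, which a full algorithm would need to address.
    """
    n = len(examples)
    if n == 0:
        return []
    m = len(examples[0][0])  # number of attributes
    interior_count = [0] * n

    for d in range(m):
        # Project onto dimension d by sorting example indices on that attribute.
        order = sorted(range(n), key=lambda i: examples[i][0][d])
        for pos, i in enumerate(order):
            label = examples[i][1]
            same_left = pos > 0 and examples[order[pos - 1]][1] == label
            same_right = pos < n - 1 and examples[order[pos + 1]][1] == label
            if same_left and same_right:
                interior_count[i] += 1

    # Keep every example that lies on a class border in at least one dimension.
    return [ex for ex, c in zip(examples, interior_count) if c < m]


if __name__ == "__main__":
    data = [
        ((1.0, 1.0), "a"), ((2.0, 2.0), "a"), ((3.0, 3.0), "a"),
        ((4.0, 4.0), "b"), ((5.0, 5.0), "b"), ((6.0, 6.0), "b"),
    ]
    for example in edit_by_projection(data):
        print(example)
```

In this toy data set the two examples in the middle of each class run are interior in both dimensions and are removed, while the examples next to the class change (and at the extremes of each ordering) are kept, so an axis-parallel split between the classes is unaffected.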


Similar resources

Two-Level Concept-Oriented Data Model

In this paper we describe a new approach to data modelling called the concept-oriented model (CoM). This model is based on the formalism of nested ordered sets, which uses the inclusion relation to produce a hierarchical structure of sets and the ordering relation to produce a multi-dimensional structure among its elements. A nested ordered set is defined as an ordered set where each element can be itself ...


A Measure for Data Set Editing by Ordered Projections

In this paper we study a measure, named weakness of an example, which allows us to establish the importance of an example to find representative patterns for the data set editing problem. Our approach consists in reducing the database size without losing information, using algorithm patterns by ordered projections. The idea is to relax the reduction factor with a new parameter, λ, removing all ...


Consistency in Real-time Collaborative Editing Systems Based on Partial Persistent Sequences

In real-time collaborative editing systems, users create a shared document by issuing insert, delete, and undo operations on their local replica anytime and anywhere. Data consistency issues arise due to concurrent editing conflicts. Traditional consistency models put restrictions on editing operations updating different portions of a shared document, which is unnecessary for many editing scena...


Restricting the parameter set of the Pascoletti-Serafini scalarization

A common approach to determine efficient solutions of a multiple objective optimization problem is reformulating it to a parameter dependent scalar optimization problem. This reformulation is called scalarization approach. Here, a well-known scalarization approach named Pascoletti-Serafini scalarization is considered. First, some difficulties of this scalarization are discussed and then ...


Sampling of Multiple Variables Based on Partially Ordered Set Theory

We introduce a new method for ranked set sampling with multiple criteria. The method relaxes the restriction of selecting just one individual variable from each ranked set. Under the new method for ranking, units are ranked in sets based on linear extensions in partially ordered set theory, considering all variables simultaneously. Results will be evaluated by a relatively extensive simulation...



Journal title:
  • Intell. Data Anal.

Volume 5

Publication date: 2000